Improved bound on the worst case complexity of Policy Iteration
Abstract
Solving Markov Decision Processes (MDPs) is a recurrent task in engineering. Even though it is known that solutions minimizing the infinite-horizon expected reward can be found in polynomial time using Linear Programming techniques, iterative methods such as the Policy Iteration algorithm (PI) usually remain the most efficient in practice. PI is guaranteed to converge in a finite number of steps; unfortunately, it is known that it may require a number of steps exponential in the size of the problem. Beyond this, many open questions remain regarding its actual worst-case complexity. In this work, we provide the first improvement over the fifteen-year-old upper bound of Mansour & Singh (1999) by showing that PI requires at most $\frac{k}{k-1}\cdot\frac{k^n}{n} + o\left(\frac{k^n}{n}\right)$ iterations to converge, where $n$ is the number of states of the MDP and $k$ is the maximum number of actions per state. Perhaps more importantly, we also show that this bound is optimal for an important relaxation of the problem.
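To make the object of the bound concrete, the following is a minimal sketch of one common PI variant, Howard's (greedy) Policy Iteration, which switches every improvable state at once. Several choices here are assumptions and are not taken from the abstract: costs are minimized, future costs are discounted by a factor `gamma < 1`, the MDP is given as dense Python lists `P[s][a][t]` (transition probabilities) and `C[s][a]` (immediate costs), and the iteration counter counts evaluated policies, which may differ from the paper's exact convention. All function names are hypothetical. The helper `leading_term_of_bound` evaluates only the leading term $\frac{k}{k-1}\cdot\frac{k^n}{n}$ and ignores the $o(k^n/n)$ part, so it is meaningful only as an asymptotic indicator.

```python
"""Sketch of Howard's (greedy) Policy Iteration for a discounted-cost MDP.

Assumptions (hypothetical setup, not from the paper): costs are minimised,
gamma < 1, P[s][a][t] = transition probability, C[s][a] = immediate cost.
"""
from typing import List, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def evaluate(policy: List[int], P, C, gamma: float) -> List[float]:
    """Cost-to-go of a fixed policy: solve (I - gamma * P_pi) v = c_pi."""
    n = len(policy)
    a = [[(1.0 if i == j else 0.0) - gamma * P[i][policy[i]][j] for j in range(n)]
         for i in range(n)]
    b = [C[i][policy[i]] for i in range(n)]
    return solve_linear(a, b)


def policy_iteration(P, C, gamma: float,
                     tol: float = 1e-12) -> Tuple[List[int], List[float], int]:
    """Howard's PI: every state with a strictly better action switches at once.

    Returns (policy, values, number of policies evaluated).
    """
    n = len(C)
    policy = [0] * n
    iterations = 0
    while True:
        v = evaluate(policy, P, C, gamma)
        iterations += 1
        new_policy = policy[:]
        for s in range(n):
            q = [C[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n))
                 for a in range(len(C[s]))]
            best = min(range(len(q)), key=q.__getitem__)
            # Switch only on strict improvement, so ties cannot cause cycling.
            if q[best] < q[policy[s]] - tol:
                new_policy[s] = best
        if new_policy == policy:
            return policy, v, iterations
        policy = new_policy


def leading_term_of_bound(n: int, k: int) -> float:
    """Leading term k/(k-1) * k**n / n of the bound (o(k**n/n) part dropped)."""
    return k / (k - 1) * k ** n / n


if __name__ == "__main__":
    # Two states, two actions each: stay put or move to the other state.
    P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    C = [[1.0, 2.0], [0.0, 5.0]]
    policy, values, iterations = policy_iteration(P, C, gamma=0.9)
    print(policy, iterations)  # prints [1, 0] 2
    print(leading_term_of_bound(n=2, k=2))
```

On this toy instance, PI starts from the all-zero policy, switches state 0 to "move" after one evaluation, and confirms optimality on the second evaluation; the trivial bound of $k^n$ distinct policies is what the Mansour & Singh bound and the bound above improve on.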
Similar resources
An improved infeasible interior-point method for symmetric cone linear complementarity problem
We present an improved version of a full Nesterov-Todd step infeasible interior-point method for the linear complementarity problem over symmetric cones (Bull. Iranian Math. Soc., 40(3), 541-564, (2014)). In the earlier version, each iteration consisted of one so-called feasibility step and a few (at most three) centering steps. Here, each iteration consists of only a feasibility step. Thus, the new...
An Interior Point Algorithm for Solving Convex Quadratic Semidefinite Optimization Problems Using a New Kernel Function
In this paper, we consider convex quadratic semidefinite optimization problems and provide a primal-dual Interior Point Method (IPM) based on a new kernel function with a trigonometric barrier term. The iteration complexity of the algorithm is analyzed under some mild, easy-to-check conditions. Although our proposed kernel function is neither a Self-Regular (SR) fun...
Improved Strong Worst-case Upper Bounds for MDP Planning
The Markov Decision Problem (MDP) plays a central role in AI as an abstraction of sequential decision making. We contribute to the theoretical analysis of MDP planning, which is the problem of computing an optimal policy for a given MDP. Specifically, we furnish improved strong worst-case upper bounds on the running time of MDP planning. Strong bounds are those that depend only on the number of ...
A path following interior-point algorithm for semidefinite optimization problem based on new kernel function
In this paper, we obtain some new complexity results for solving the semidefinite optimization (SDO) problem by interior-point methods (IPMs). We define a new proximity function for the SDO problem by means of a new kernel function. Furthermore, we formulate a primal-dual interior-point algorithm for the SDO problem using this proximity function, give its complexity analysis, and then we sho...
A POLYNOMIAL TIME BRANCH AND BOUND ALGORITHM FOR THE SINGLE ITEM ECONOMIC LOT SIZING PROBLEM WITH ALL UNITS DISCOUNT AND RESALE
The purpose of this paper is to present a polynomial time algorithm which determines the lot sizes for purchase component in Material Requirement Planning (MRP) environments with deterministic time-phased demand with zero lead time. In this model, backlog is not permitted, the unit purchasing price is based on the all-units discount system and resale of the excess units is possible at the order...
Journal title:
- Oper. Res. Lett.
Volume 44
Publication date: 2016